Mono Versus Multilingual BERT: A Case Study in Hindi and Marathi Named Entity Recognition

Abstract

Named entity recognition (NER) is the process of recognizing and classifying important information (entities) in text. Proper nouns, such as a person's name, an organization's name, or a location's name, are examples of entities. NER is one of the important modules in applications like human resources, customer support, search engines, content classification, and academia. In this work, we consider NER for the low-resource Indian languages Hindi and Marathi. Transformer-based models have been widely used for NER tasks. We consider different variations of BERT, such as base-BERT, RoBERTa, and AlBERT, and benchmark them on publicly available Hindi and Marathi NER datasets. We provide an exhaustive comparison of monolingual and multilingual transformer-based models and establish simple baselines currently missing in the literature. We show that the monolingual MahaRoBERTa model performs best for Marathi NER, whereas the multilingual XLM-RoBERTa performs best for Hindi NER. We also perform cross-language evaluation and present mixed observations.
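As a rough illustration of what "recognizing and classifying entities" means in practice, the sketch below groups per-token labels into typed entities. It assumes the common BIO tagging scheme and the label names `PER` and `LOC`; neither is stated in the abstract, and the function name `bio_to_spans` and the English example sentence are hypothetical, chosen only for readability.

```python
from typing import List, Tuple


def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_text, entity_type) pairs.

    Assumes tags of the form "B-TYPE", "I-TYPE", or "O" (an assumption,
    not something specified by the paper).
    """
    spans: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type = None

    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new entity begins; flush any entity that was still open.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens = [token]
            current_type = tag[2:]
        elif tag.startswith("I-") and current_type == tag[2:]:
            # Continuation of the currently open entity.
            current_tokens.append(token)
        else:
            # "O", or an I- tag that does not continue the open entity:
            # close the open entity (if any) and drop the stray token.
            if current_tokens:
                spans.append((" ".join(current_tokens), current_type))
            current_tokens, current_type = [], None

    # Flush an entity that runs to the end of the sentence.
    if current_tokens:
        spans.append((" ".join(current_tokens), current_type))
    return spans


# Hypothetical example sentence with hand-written labels.
tokens = ["Sachin", "Tendulkar", "was", "born", "in", "Mumbai"]
tags = ["B-PER", "I-PER", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('Sachin Tendulkar', 'PER'), ('Mumbai', 'LOC')]
```

In a transformer-based setup like the one the abstract describes, the `tags` list would come from a token-classification model's predictions rather than being written by hand.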

Similar resources

Multilingual Named-Entity Recognition from Parallel Corpora

We present a named-entity recognition (NER) system for parallel multilingual text. Our system handles three languages (i.e., English, French, and Spanish) and is tailored to the biomedical domain. For each language, we design a supervised knowledge-based CRF model with rich biomedical and general domain information. We use the sentence alignment of the parallel corpora, the word alignment gener...

POLYGLOT-NER: Massive Multilingual Named Entity Recognition

The increasing diversity of languages used on the web introduces a new level of complexity to Information Retrieval (IR) systems. We can no longer assume that textual content is written in one language or even the same language family. In this paper, we demonstrate how to build massive multilingual annotators with minimal human expertise and intervention. We describe a system that builds Named ...

Learning multilingual named entity recognition from Wikipedia

We automatically create enormous, free and multilingual silver-standard training annotations for named entity recognition (ner) by exploiting the text and structure of Wikipedia. Most ner systems rely on statistical models of annotated data to identify and classify names of people, locations and organisations in text. This dependence on expensive annotation is the knowledge bottleneck our work ...

Invited Talk: Multilingual Named Entity Recognition

The computational research aiming at automatically identifying named entities (NE) in texts forms a vast and heterogeneous pool of strategies, techniques and representations from hand-crafted rules towards machine learning approaches. Hand-crafted rule based systems provide good performance at a relatively high system engineering cost. The availability of a large collection of annotated data is...

The Multilingual Named Entity Recognition Framework

This paper presents a multilingual system designed to recognize named entities in a wide variety of languages (currently more than 12 languages are concerned). The system includes original strategies to deal with a wide variety of encoding character sets, analysis strategies and algorithms to process these languages.

Journal

Journal title: Lecture Notes in Networks and Systems

Year: 2023

ISSN: 2367-3370, 2367-3389

DOI: https://doi.org/10.1007/978-981-19-6088-8_56